Abstract: We consider a convex unconstrained optimization problem that arises in a network of agents
whose goal is to cooperatively optimize the sum of 
the individual agent objective functions through local 
computations and communications. For this problem, 
we use averaging algorithms to develop distributed 
subgradient methods that can operate over a time- 
varying topology. Our focus is on the convergence rate 
of these methods and the degradation in performance 
when only quantized information is available. Based on 
our recent results on the convergence time of distributed 
averaging algorithms, we derive improved upper bounds 
on the convergence rate of the unquantized subgradient 
method. We then propose a distributed subgradient 
method under the additional constraint that agents can 
only store and communicate quantized information, and 
we provide bounds on its convergence rate that highlight 
the dependence on the number of quantization levels. 

I. Introduction 

There has been much interest in developing dis- 
tributed methods for optimization in networked- 
systems consisting of multiple agents with local in- 
formation structures. Such problems arise in a variety 
of environments including resource allocation among 
heterogeneous agents in large-scale networks, and 
information processing and estimation in sensor net- 
works. Optimization algorithms deployed in such net- 
works should be completely distributed, relying only 
on local observations and information, and robust 
against changes in network topology due to mobility 
or node failures. 

Recent work [15] has proposed a subgradient 
method for optimizing the sum of convex objective 
functions corresponding to n agents connected over a 
time-varying topology (see also the short paper [14]). 
The goal of the agents is to cooperatively solve the 
unconstrained optimization problem 

\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n} f_i(x) \\
\text{subject to} & x \in \mathbb{R}^m,
\end{array}
\tag{1}
\]

where each $f_i : \mathbb{R}^m \to \mathbb{R}$ is a convex function,
representing the local objective function of agent i,
and known only to this agent. The decision vector x
in problem (1) can be viewed as either a resource
vector whose components correspond to resources 
allocated to each agent, or a global decision vector 
which the agents are trying to optimize using local 
information. The proposed method builds on the 
work in [23], [24] (see also, [3]). It relies on every 
agent maintaining estimates of an optimal solution 
to problem (1), and communicating these estimates
locally to its neighbors. The estimates are updated
using a combination of a subgradient iteration^1 and an
averaging algorithm. The subgradient step optimizes 
the local objective function while the averaging algo- 
rithm is used to obtain information about the objective 
functions of the other agents. 

In this paper, we consider the distributed subgradi- 
ent method discussed in [15], and provide improved 
convergence rate results. In particular, we use our 
recent results on the convergence time of averaging 
algorithms [13] and establish new upper bounds on 
the difference between the objective function value 
of the estimates of each agent and the optimal value 
of problem (1). These bounds have a polynomial
dependence on the number of agents n (in contrast 
with the error bounds in [15], [14], which involve 
exponential dependence on n). Furthermore, we study 
a variation of the distributed subgradient method in 
which the agents have access to quantized informa- 
tion, and provide bounds on the convergence time that 
contain additional error terms due to quantization. 

In addition to the papers cited above, our work is 
related to the literature on reaching consensus on a 
particular scalar value or on computing exact averages 
of the initial values of the agents, a subject motivated by natural models of cooperative behavior in networked-systems (see, e.g., [25], [9], [4], [16], [5], and [17], [18]). Closely related is also the work in [10] and [7], [6], which study the effects of quantization on the performance of averaging algorithms. Our work is also related to the utility maximization framework for resource allocation in networks (see [11], [12], [22]). In contrast to this literature, however, we allow the local objective functions to depend on the entire resource allocation vector.

^1 For subgradient methods see, for example, [19], [21], [20], [8], [1], [2].

The rest of this paper is organized as follows. In Section II, we describe the distributed subgradient method and present an improved convergence rate estimate using our recently established bounds on the convergence time of our averaging algorithms [13]. In Section III, we consider a version of the method under the additional constraint that the agents can only exchange quantized information. We provide convergence and rate of convergence results as a function of the number of quantization levels. Section IV contains our concluding remarks.

Notation and Basic Notions. We view all vectors as columns. We use $e_i$ to denote the vector with ith entry equal to 1 and all other entries equal to 0. We use $\mathbf{1}$ to denote a vector with all entries equal to 1. For a matrix A, we use $A_{ij}$ or $[A]_{ij}$ to denote the matrix entry in the ith row and jth column. We write $[A]_i$ and $[A]^j$ to denote respectively the ith row and the jth column of a matrix A. A vector a is said to be a stochastic vector when its components $a_i$ are nonnegative and $\sum_i a_i = 1$. A square matrix A is said to be stochastic when each row of A is a stochastic vector, and it is said to be doubly stochastic when both A and its transpose A' are stochastic matrices.

For a convex function $F : \mathbb{R}^m \to \mathbb{R}$, we use the notion of a subgradient (see [2]): a vector $s_F(\bar{x}) \in \mathbb{R}^m$ is a subgradient of a convex function F at $\bar{x}$ if

\[
F(\bar{x}) + s_F(\bar{x})'(x - \bar{x}) \le F(x) \quad \text{for all } x.
\]

We use the notation $f(x) = \sum_{i=1}^{n} f_i(x)$. We denote the optimal value of problem (1) by $f^*$ and the set of optimal solutions by $X^*$.
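As a small illustration of the subgradient inequality above, the following sketch checks it numerically for the scalar function F(x) = |x|. The function names and the grid of test points are assumptions made for this example only.

```python
def abs_subgradient(x_bar: float) -> float:
    """Return one subgradient of F(x) = |x| at x_bar (at 0, any value in [-1, 1] works)."""
    if x_bar > 0:
        return 1.0
    if x_bar < 0:
        return -1.0
    return 0.0


def satisfies_subgradient_inequality(F, s, x_bar, test_points):
    """Check F(x_bar) + s * (x - x_bar) <= F(x) at every test point."""
    return all(F(x_bar) + s * (x - x_bar) <= F(x) + 1e-12 for x in test_points)


# Hypothetical grid of points in [-5, 5].
points = [i / 4 for i in range(-20, 21)]
all_valid = all(
    satisfies_subgradient_inequality(abs, abs_subgradient(xb), xb, points)
    for xb in points
)
```

A slope outside [-1, 1] at the origin, such as s = 2, violates the inequality at x = 1, so it is not a subgradient of |x| at 0.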

II. Distributed Subgradient Method 

We first introduce our distributed subgradient 
method for solving problem (1) and discuss the assumptions imposed on the information exchange among the agents. We consider a set $V = \{1, \ldots, n\}$ of agents. Each agent starts with an initial estimate $x_i(0) \in \mathbb{R}^m$ and updates its estimate at discrete times $t_k$, $k = 1, 2, \ldots$. We denote by $x_i(k)$ the vector estimate maintained by agent i at time $t_k$. When updating, an agent i combines its current estimate $x_i$ with the estimates $x_j$ received from its neighboring



agents j. Specifically, agent i updates its estimates by 
setting 

\[
x_i(k+1) = \sum_{j=1}^{n} a_{ij}(k)\, x_j(k) - \alpha\, d_i(k), \tag{2}
\]

where the scalars $a_{i1}(k), \ldots, a_{in}(k)$ are nonnegative weights and the scalar $\alpha > 0$ is a stepsize. The vector $d_i(k)$ is a subgradient of the agent i cost function $f_i(x)$ at $x = x_i(k)$. We use the notation A(k) to denote the weight matrix $[a_{ij}(k)]_{i,j=1,\ldots,n}$.
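The update (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the cost functions $f_i(x) = |x - c_i|$, the targets, the fixed weight matrix, and the stepsize are all assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def subgradient_update(
    x: List[Vector],
    A: List[List[float]],
    subgrad: List[Callable[[Vector], Vector]],
    alpha: float,
) -> List[Vector]:
    """One step of x_i(k+1) = sum_j a_ij(k) x_j(k) - alpha * d_i(k)."""
    n, m = len(x), len(x[0])
    new_x = []
    for i in range(n):
        d_i = subgrad[i](x[i])  # subgradient of f_i at agent i's own estimate
        mixed = [sum(A[i][j] * x[j][c] for j in range(n)) for c in range(m)]
        new_x.append([mixed[c] - alpha * d_i[c] for c in range(m)])
    return new_x


def make_abs_subgrad(c: float) -> Callable[[Vector], Vector]:
    """Subgradient oracle for the illustrative cost f_i(x) = |x - c| (m = 1)."""
    return lambda x: [1.0 if x[0] > c else (-1.0 if x[0] < c else 0.0)]


# Hypothetical 3-agent example; sum_i |x - c_i| is minimized at the median, x = 1.
targets = [0.0, 1.0, 5.0]
oracles = [make_abs_subgrad(c) for c in targets]
A = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # doubly stochastic
x = [[0.0], [0.0], [0.0]]
for _ in range(2000):
    x = subgradient_update(x, A, oracles, alpha=0.01)
```

With a constant stepsize the estimates do not converge exactly; in this example they settle in a small neighborhood of the minimizer whose size scales with the stepsize.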

The evolution of the estimates $x_i(k)$ generated by Eq. (2) can be equivalently represented using transition matrices. In particular, we define a transition matrix $\Phi(k, s)$ for any s and k with $k \ge s$, as follows:

\[
\Phi(k, s) = A(k) A(k-1) \cdots A(s+1) A(s).
\]

Using these transition matrices, we relate the estimate $x_i(k+1)$ to the estimates $x_1(s), \ldots, x_n(s)$ for any $s \le k$. In particular, for the iterates generated by Eq. (2), we have for any i, and any s and k with $k \ge s$,

\[
x_i(k+1) = \sum_{j=1}^{n} [\Phi(k,s)]_{ij}\, x_j(s)
 - \alpha \sum_{r=s}^{k-1} \sum_{j=1}^{n} [\Phi(k,r+1)]_{ij}\, d_j(r)
 - \alpha\, d_i(k) \tag{3}
\]

(for more details, see [15]). As seen from the preceding relation, to study the asymptotic behavior of the estimates $x_i(k)$, we need to understand the behavior of the transition matrices $\Phi(k, s)$. We do this under some assumptions on the agent interactions that translate into some properties of transition matrices.

Our first assumption imposes some conditions on the weights $a_{ij}(k)$ in Eq. (2).

Assumption 1: For all $k \ge 0$, the weight matrix A(k) is doubly stochastic with positive diagonal. Additionally, there is a scalar $\eta > 0$ such that if $a_{ij}(k) > 0$, then $a_{ij}(k) \ge \eta$.

The double stochasticity assumption on the weight
matrix will guarantee that the subgradient of the 
objective function fi of every agent i will receive 
the same weight in the long run. The second part of 
the assumption states that each agent gives significant 
weight to its own values and to the values of its 
neighbors. 
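Assumption 1 is straightforward to check for a given weight matrix. The sketch below is one possible check; the function name, the tolerance, and the example matrix are assumptions for illustration.

```python
def satisfies_assumption_1(A, eta, tol=1e-12):
    """Check double stochasticity, positive diagonal, and the lower bound eta
    on every positive entry of the weight matrix A (nested lists)."""
    n = len(A)
    nonnegative = all(a >= 0 for row in A for a in row)
    rows_ok = all(abs(sum(A[i]) - 1.0) <= tol for i in range(n))
    cols_ok = all(abs(sum(A[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    diag_ok = all(A[i][i] > 0 for i in range(n))
    bounded = all(a >= eta - tol for row in A for a in row if a > 0)
    return nonnegative and rows_ok and cols_ok and diag_ok and bounded


# Hypothetical weights for a 3-agent line graph 0 - 1 - 2.
A_line = [[0.5, 0.5, 0.0], [0.5, 0.25, 0.25], [0.0, 0.25, 0.75]]
ok_quarter = satisfies_assumption_1(A_line, eta=0.25)
ok_third = satisfies_assumption_1(A_line, eta=0.3)
```

Here the smallest positive entry is 0.25, so the check passes with eta = 0.25 and fails with eta = 0.3.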

At each time k, the agents' connectivity can be represented by a directed graph $G(k) = (V, \mathcal{E}(A(k)))$, where $\mathcal{E}(A)$ is the set of directed edges (j, i), including self-edges (i, i), such that $a_{ij} > 0$. Our next assumption ensures that the agents are connected frequently enough to persistently influence each other.


Assumption 2: There exists an integer $B \ge 1$ such that the directed graph

\[
\bigl(V,\ \mathcal{E}(A(kB)) \cup \cdots \cup \mathcal{E}(A((k+1)B - 1))\bigr)
\]

is strongly connected for all $k \ge 0$.
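To make Assumption 2 concrete, the following sketch builds the union of the edge sets over one window of length B and tests strong connectivity with two graph searches. The helper names and the two alternating weight matrices are assumptions for illustration.

```python
def edges_of(A):
    """Directed edges (j, i) with a_ij > 0, including self-edges."""
    n = len(A)
    return {(j, i) for i in range(n) for j in range(n) if A[i][j] > 0}


def reachable(n, edges, start):
    """Set of nodes reachable from start by following directed edges."""
    adj = {v: [] for v in range(n)}
    for (u, v) in edges:
        adj[u].append(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_strongly_connected(n, edges):
    """Node 0 reaches every node, and every node reaches node 0."""
    reverse = {(v, u) for (u, v) in edges}
    return len(reachable(n, edges, 0)) == n and len(reachable(n, reverse, 0)) == n


def window_is_connected(matrices, k, B):
    """Check that the union of E(A(kB)), ..., E(A((k+1)B - 1)) is strongly connected."""
    n = len(matrices[0])
    union = set()
    for t in range(k * B, (k + 1) * B):
        union |= edges_of(matrices[t])
    return is_strongly_connected(n, union)


# Hypothetical topology: agents 0 and 1 talk at even times, agents 1 and 2 at odd times.
A01 = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
A12 = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]]
schedule = [A01, A12, A01, A12]
```

In this example no single graph is connected, but every window of length B = 2 is, which is the situation the assumption is designed to allow.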

A. Preliminary Results 

Here, we provide some results that we use later in 
our convergence analysis of method (2). These results hold under Assumptions 1 and 2.

Consider a related update rule of the form

\[
z(k+1) = A(k) z(k), \tag{4}
\]

where $z(0) \in \mathbb{R}^n$ is an initial vector^2. Define

\[
V(k) = \sum_{j=1}^{n} \bigl(z_j(k) - \bar{z}(k)\bigr)^2 \quad \text{for all } k \ge 0,
\]

where $\bar{z}(k)$ is the average of the entries of the vector z(k). Under the double stochasticity of A(k), the initial average $\bar{z}(0)$ is preserved by the update rule (4), i.e., $\bar{z}(k) = \bar{z}(0)$ for all k. Hence, the function V(k) measures the "disagreement" in agent values.
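The two properties just stated, preservation of the average and the role of V(k) as a disagreement measure, can be observed on a small example. The sketch below uses exact rational arithmetic so that the preserved average can be compared with equality; the weight matrix and the initial vector are assumptions for illustration.

```python
from fractions import Fraction


def average(z):
    return sum(z) / len(z)


def disagreement(z):
    """V = sum_j (z_j - z_bar)^2."""
    z_bar = average(z)
    return sum((zj - z_bar) ** 2 for zj in z)


def averaging_step(A, z):
    """One step of z(k+1) = A z(k)."""
    n = len(z)
    return [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical doubly stochastic weights and initial values.
A = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 4), Fraction(3, 4)],
]
z = [Fraction(0), Fraction(3), Fraction(9)]
averages, V = [average(z)], [disagreement(z)]
for _ in range(20):
    z = averaging_step(A, z)
    averages.append(average(z))
    V.append(disagreement(z))
```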

In the next lemma, we give a bound on the decrease of the agent disagreement V(kB), which is linear in $\eta$ and quadratic in $n^{-1}$. This bound is an immediate consequence of Lemma 5 in [13].

Lemma 1: Let Assumptions 1 and 2 hold. Then, V(k) is nonincreasing in k. Furthermore,

\[
V((k+1)B) \le \left(1 - \frac{\eta}{2n^2}\right) V(kB) \quad \text{for all } k \ge 0.
\]



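Using Lemma 1, we obtain the following result for the transition matrices $\Phi(k, s)$ of Eq. (3).

Corollary 1: Let Assumptions 1 and 2 hold. Then, for all i, j and all k, s with $k \ge s$, we have

\[
\left| [\Phi(k,s)]_{ij} - \frac{1}{n} \right| \le \left(1 - \frac{\eta}{4n^2}\right)^{\lceil \frac{k-s+1}{B} \rceil - 2}.
\]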



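Proof: By Lemma 1, we have for all $k \ge s$,

\[
V(kB) \le \left(1 - \frac{\eta}{2n^2}\right)^{k-s} V(sB).
\]

Let k and s be arbitrary with $k \ge s$, and let

\[
\tau B \le s < (\tau + 1)B, \qquad tB \le k < (t+1)B,
\]

with $\tau \le t$. Hence, by the nonincreasing property of V(k), we have

\[
V(k) \le V(tB) \le \left(1 - \frac{\eta}{2n^2}\right)^{t-\tau-1} V((\tau+1)B) \le \left(1 - \frac{\eta}{2n^2}\right)^{t-\tau-1} V(s).
\]

Note that $k - s < (t - \tau)B + B$, implying that $\lceil \frac{k-s+1}{B} \rceil \le t - \tau + 1$, where we used the fact that both sides of the inequality are integers. Therefore $\lceil \frac{k-s+1}{B} \rceil - 2 \le t - \tau - 1$, and we have for all k and s with $k \ge s$,

\[
V(k) \le \left(1 - \frac{\eta}{2n^2}\right)^{\lceil \frac{k-s+1}{B} \rceil - 2} V(s). \tag{5}
\]

^2 This update rule captures the averaging part of Eq. (2), as it operates on a particular component of the agent estimates, with the vector $z(k) \in \mathbb{R}^n$ representing the estimates of the different agents for that component.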



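By Eq. (4), we have z(k+1) = A(k)z(k), and therefore $z(k+1) = \Phi(k, s) z(s)$ for all $k \ge s$. Letting $z(s) = e_i$ we obtain $z(k+1) = [\Phi(k,s)]^i$. Using the inequalities (5) and $V(e_i) \le 1$, we obtain

\[
V\bigl([\Phi(k,s)]^i\bigr) \le \left(1 - \frac{\eta}{2n^2}\right)^{\lceil \frac{k-s+1}{B} \rceil - 2}.
\]

The matrix $\Phi(k, s)$ is doubly stochastic, because it is the product of doubly stochastic matrices. Thus, the average entry of $[\Phi(k,s)]^i$ is 1/n, implying that for all i and j,

\[
\left( [\Phi(k,s)]_{ji} - \frac{1}{n} \right)^2 \le V\bigl([\Phi(k,s)]^i\bigr) \le \left(1 - \frac{\eta}{2n^2}\right)^{\lceil \frac{k-s+1}{B} \rceil - 2}.
\]

From the preceding relation and $\sqrt{1 - \eta/(2n^2)} \le 1 - \eta/(4n^2)$, we obtain for all i and j,

\[
\left| [\Phi(k,s)]_{ij} - \frac{1}{n} \right| \le \left(1 - \frac{\eta}{4n^2}\right)^{\lceil \frac{k-s+1}{B} \rceil - 2}. \qquad ■
\]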
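The bound of Corollary 1 can be compared with the actual entries of $\Phi(k, s)$ on a small example. The sketch below assumes a constant weight matrix whose graph is strongly connected at every step (so B = 1) and whose smallest positive entry is 0.25; all names and values are illustrative.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(matrices, k, s):
    """Phi(k, s) = A(k) A(k-1) ... A(s+1) A(s), for k >= s."""
    phi = matrices[s]
    for t in range(s + 1, k + 1):
        phi = mat_mul(matrices[t], phi)
    return phi


def corollary1_bound(n, eta, B, k, s):
    """Right-hand side (1 - eta/(4 n^2)) ** (ceil((k - s + 1)/B) - 2)."""
    beta = 1.0 - eta / (4.0 * n * n)
    exponent = -(-(k - s + 1) // B) - 2  # integer ceiling, then minus 2
    return beta ** exponent


def max_deviation(phi):
    """Largest |phi_ij - 1/n| over all entries."""
    n = len(phi)
    return max(abs(phi[i][j] - 1.0 / n) for i in range(n) for j in range(n))


# Hypothetical constant weights on the line graph 0 - 1 - 2.
A = [[0.5, 0.5, 0.0], [0.5, 0.25, 0.25], [0.0, 0.25, 0.75]]
matrices = [A] * 60
n, eta, B = 3, 0.25, 1
checks = [
    max_deviation(transition_matrix(matrices, k, 0)) <= corollary1_bound(n, eta, B, k, 0)
    for k in range(60)
]
```

On this example the entries approach 1/n much faster than the bound requires, which is consistent with the bound being a worst-case estimate over all topologies satisfying the assumptions.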



B. Convergence time 

We now study the convergence of the subgradient method (2) and obtain a convergence time bound. We assume the uniform boundedness of the set of subgradients of the cost functions at all points^3: for some scalar L > 0, we have for all $x \in \mathbb{R}^m$ and all i,

\[
\|g\| \le L \quad \text{for all } g \in \partial f_i(x), \tag{6}
\]

where $\partial f_i(x)$ is the set of all subgradients of $f_i$ at x.

We define the time-averaged vectors $\hat{x}_i(k)$ of the iterates $x_i(k)$ generated by Eq. (2), i.e.,

\[
\hat{x}_i(k) = \frac{1}{k} \sum_{h=0}^{k-1} x_i(h). \tag{7}
\]



The use of these vectors allows us to bound the objective function improvement at every iteration; see [15], [14]. Under the subgradient boundedness assumption, we have the following result^4.

^3 This assumption can be relaxed, see [15].

Theorem 1: Let Assumptions 1 and 2 hold, and assume that the set $X^*$ of optimal solutions of problem (1) is nonempty. Let the sets of subgradients be bounded as in Eq. (6). Also, let the initial vectors $x_i(0)$ in Eq. (2) be such that $\max_{1 \le i \le n} \|x_i(0)\| \le \alpha L$. Then, the averages $\hat{x}_i(k)$ of the iterates obtained by the method (2) satisfy

\[
f(\hat{x}_i(k)) \le f^* + \frac{n\, \mathrm{dist}^2(y(0), X^*)}{2\alpha k} + \frac{\alpha L^2 C}{2} + 2\alpha n L^2 C_1,
\]

where

\[
C = 1 + 8nC_1, \qquad C_1 = 1 + \frac{nB}{\beta(1-\beta)}, \qquad \beta = 1 - \frac{\eta}{4n^2}, \tag{8}
\]

and $y(0) = (1/n) \sum_{j=1}^{n} x_j(0)$.

Proof: The proof is identical to that of Proposition 3 in [15] and relies on the use of our improved convergence rate bound in Corollary 1. ■

The convergence rate result in the preceding theorem improves that of Proposition 3 in [15], where an analogous estimate is shown with a worse value for the constant $\beta$. In particular, the constant $\beta$ in [15] is given by $\beta = 1 - \eta^{(n-1)B}$, and $C_1$ increases exponentially with n. As seen from Eq. (8), our new constant $C_1$ increases only polynomially with n, indicating a much more favorable scaling as the network size increases.
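The difference in scaling can be seen by evaluating $C_1$ numerically. The sketch below evaluates the constants of Eq. (8) and, as an assumption made only for comparison, plugs the value of $\beta$ reported for [15] into the same formula for $C_1$; the parameter values are hypothetical.

```python
def c1_from_gap(n, B, gap):
    """C_1 = 1 + nB / (beta (1 - beta)), written in terms of gap = 1 - beta."""
    beta = 1.0 - gap
    return 1.0 + n * B / (beta * gap)


def new_gap(n, eta):
    """1 - beta for this paper: beta = 1 - eta / (4 n^2)."""
    return eta / (4.0 * n * n)


def old_gap(n, B, eta):
    """1 - beta for the constant reported for [15]: beta = 1 - eta^((n-1)B)."""
    return eta ** ((n - 1) * B)


def theorem1_bound(n, B, eta, alpha, L, k, dist0_sq):
    """Right-hand side of Theorem 1 minus f*, with C and C_1 from Eq. (8)."""
    C1 = c1_from_gap(n, B, new_gap(n, eta))
    C = 1.0 + 8.0 * n * C1
    return (n * dist0_sq / (2.0 * alpha * k)
            + alpha * L ** 2 * C / 2.0
            + 2.0 * alpha * n * L ** 2 * C1)


# Hypothetical parameters: eta = 0.25, B = 1, and a few network sizes.
rows = [
    (n, c1_from_gap(n, 1, new_gap(n, 0.25)), c1_from_gap(n, 1, old_gap(n, 1, 0.25)))
    for n in (10, 20, 40)
]
```

Each row holds n, the new $C_1$, and the older $C_1$. The first term of the bound decays as 1/k, while the remaining terms are proportional to the stepsize and do not vanish as k grows.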

III. Quantization Effects

We next study the effects of quantization on the 
convergence properties of the subgradient method. In 
particular, we assume that each agent receives and sends only quantized estimates, i.e., vectors whose entries are integer multiples of 1/Q. At time k, an agent receives quantized estimates $x_j^Q(k)$ from some of its neighbors and updates according to the following rule:

\[
x_i^Q(k+1) = \left\lfloor \sum_{j=1}^{n} a_{ij}(k)\, x_j^Q(k) - \alpha\, d_i(k) \right\rfloor, \tag{9}
\]

where $d_i(k)$ is a subgradient of $f_i$ at $x_i^Q(k)$, and $\lfloor y \rfloor$ denotes the operation of (componentwise) rounding the entries of a vector y down to the nearest multiple of 1/Q. We also assume that the agents' initial estimates $x_j^Q(0)$ are quantized.

^4 The assumption $\max_{1 \le i \le n} \|x_i(0)\| \le \alpha L$ in this theorem is not essential. We use this assumption mainly to present a more compact expression for the bound on the convergence time. A bound that explicitly depends on $\|x_i(0)\|$ can be obtained by following a similar line of analysis.



To analyze the proposed method, we find it useful to rewrite Eq. (9) as follows:

\[
x_i^Q(k+1) = \sum_{j=1}^{n} a_{ij}(k)\, x_j^Q(k) - \alpha\, d_i(k) - e_i(k+1), \tag{10}
\]

where the error vector $e_i(k+1)$ is given by

\[
e_i(k+1) = \sum_{j=1}^{n} a_{ij}(k)\, x_j^Q(k) - \alpha\, d_i(k) - x_i^Q(k+1). \tag{11}
\]

Thus, the method can be viewed as a subgradient method with external (possibly persistent) noise, represented by $e_i(k+1)$. Due to the rounding down to the nearest multiple of 1/Q, the error vector $e_i(k+1)$ satisfies

\[
0 \le e_i(k+1) \le \frac{1}{Q} \mathbf{1} \quad \text{for all } i \text{ and } k, \tag{12}
\]

where the inequalities above hold componentwise.
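The quantized update and its error vector can be sketched as follows. Exact rational arithmetic is used so that the rounding and the bound (12) can be checked without floating-point effects; the function names, weights, subgradient values, and Q = 8 are assumptions for illustration.

```python
from fractions import Fraction
import math


def round_down(y, Q):
    """Componentwise rounding down to a multiple of 1/Q."""
    return [Fraction(math.floor(v * Q), Q) for v in y]


def quantized_update(xQ, A, d, alpha, Q):
    """Return the new quantized estimates, as in Eq. (9), and the error
    vectors e_i(k+1) defined in Eq. (11)."""
    n, m = len(xQ), len(xQ[0])
    new_x, errors = [], []
    for i in range(n):
        raw = [sum(A[i][j] * xQ[j][c] for j in range(n)) - alpha * d[i][c]
               for c in range(m)]
        q = round_down(raw, Q)
        new_x.append(q)
        errors.append([r - v for r, v in zip(raw, q)])  # unrounded minus rounded
    return new_x, errors


# Hypothetical data: three agents, scalar estimates, quantization step 1/8.
Q = 8
A = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 4), Fraction(3, 4)],
]
xQ = [[Fraction(0)], [Fraction(3, 8)], [Fraction(9, 8)]]
d = [[Fraction(1)], [Fraction(-1)], [Fraction(1)]]
new_x, errors = quantized_update(xQ, A, d, Fraction(1, 10), Q)
```

Because the rounding is downward, each error component is nonnegative and smaller than 1/Q, which is the componentwise statement of (12).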

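Using the transition matrices $\Phi(k, s)$, we can rewrite the update equation (10) as

\[
\begin{aligned}
x_i^Q(k+1) ={} & \sum_{j=1}^{n} [\Phi(k,0)]_{ij}\, x_j^Q(0)
 - \alpha \sum_{s=1}^{k} \sum_{j=1}^{n} [\Phi(k,s)]_{ij}\, d_j(s-1) \\
& - \sum_{s=1}^{k} \sum_{j=1}^{n} [\Phi(k,s)]_{ij}\, e_j(s)
 - \alpha\, d_i(k) - e_i(k+1).
\end{aligned}
\tag{13}
\]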

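In addition, we consider a related stopped model, where after some time $\bar{k}$, the agents cease computing subgradients $d_j(k)$, and also after time $\bar{k} + 1$ stop quantizing (so that they can send and receive real numbers). Thus, in this stopped model, we have

\[
d_i(k) = 0 \quad \text{and} \quad e_i(k+1) = 0, \quad \text{for all } i \text{ and } k \ge \bar{k}.
\]

Let $\{\tilde{x}_i(k)\}$, $i = 1, \ldots, n$, be the sequences generated by the stopped model, associated with a particular time $\bar{k}$. In view of the preceding relation, we have for each i,

\[
\tilde{x}_i(k) = x_i^Q(k) \quad \text{for } k \le \bar{k},
\]

and for $k \ge \bar{k} + 1$,

\[
\begin{aligned}
\tilde{x}_i(k) ={} & \sum_{j=1}^{n} [\Phi(k-1,0)]_{ij}\, x_j^Q(0)
 - \alpha \sum_{s=1}^{\bar{k}} \sum_{j=1}^{n} [\Phi(k-1,s)]_{ij}\, d_j(s-1) \\
& - \sum_{s=1}^{\bar{k}} \sum_{j=1}^{n} [\Phi(k-1,s)]_{ij}\, e_j(s).
\end{aligned}
\tag{14}
\]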



Using the result of Corollary 1, we can show that the stopped process converges as $k \to \infty$. In particular, we have the following result.

Lemma 2: Let Assumptions 1 and 2 hold. Then, for any i and any $\bar{k} \ge 0$, the sequence $\{\tilde{x}_i(k)\}$ generated by Eq. (14) converges and the limit vector does not depend on i, i.e.,

\[
\lim_{k \to \infty} \tilde{x}_i(k) = y(\bar{k}) \quad \text{for all } i \text{ and } \bar{k}.
\]

Furthermore, for the limit sequence y(k), we have:

(a) For all k,

\[
y(k+1) = y(k) - \frac{\alpha}{n} \sum_{j=1}^{n} d_j(k) - \frac{1}{n} \sum_{j=1}^{n} e_j(k+1).
\]

(b) When the subgradient norms are uniformly bounded by some scalar L [cf. Eq. (6)] and the agents' initial values are such that $\max_j \|x_j^Q(0)\| \le \alpha L$, then for all i and k,

\[
\|x_i^Q(k) - y(k)\| \le 2 \left( \alpha L + \frac{\sqrt{m}}{Q} \right) \left( 1 + \frac{nB}{\beta(1-\beta)} \right),
\]

where $\beta = 1 - \frac{\eta}{4n^2}$ and m is the dimension of the vectors $x_i^Q$.

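Proof: By Corollary 1, for any $s \ge 0$, the entries $[\Phi(k,s)]_{ij}$ converge to 1/n, as $k \to \infty$. By letting $k \to \infty$ in Eq. (14), we see that the limit $\lim_{k \to \infty} \tilde{x}_i(k)$ exists and is independent of i. Denote this limit by $y(\bar{k})$, and note that it is given by

\[
y(\bar{k}) = \frac{1}{n} \sum_{j=1}^{n} x_j^Q(0)
 - \frac{\alpha}{n} \sum_{s=1}^{\bar{k}} \sum_{j=1}^{n} d_j(s-1)
 - \frac{1}{n} \sum_{s=1}^{\bar{k}} \sum_{j=1}^{n} e_j(s). \tag{15}
\]

From the preceding relation, applied to different values of $\bar{k}$, we see that

\[
y(k+1) = y(k) - \frac{\alpha}{n} \sum_{j=1}^{n} d_j(k) - \frac{1}{n} \sum_{j=1}^{n} e_j(k+1).
\]

This establishes part (a).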

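Using the relations in Eqs. (13) and (15), and the subgradient boundedness, we obtain for all k,

\[
\begin{aligned}
\|x_i^Q(k) - y(k)\| \le{} & \sum_{j=1}^{n} \left| [\Phi(k-1,0)]_{ij} - \frac{1}{n} \right| \|x_j^Q(0)\|
 + \alpha L \sum_{s=1}^{k-1} \sum_{j=1}^{n} \left| [\Phi(k-1,s)]_{ij} - \frac{1}{n} \right| \\
& + \sum_{s=1}^{k-1} \sum_{j=1}^{n} \left| [\Phi(k-1,s)]_{ij} - \frac{1}{n} \right| \|e_j(s)\|
 + 2\alpha L + \|e_i(k)\| + \frac{1}{n} \sum_{j=1}^{n} \|e_j(k)\|.
\end{aligned}
\]

By using Corollary 1, which holds for all i and j and any $k \ge s$, we have

\[
\begin{aligned}
\|x_i^Q(k) - y(k)\| \le{} & \sum_{j=1}^{n} \beta^{\lceil \frac{k}{B} \rceil - 2} \|x_j^Q(0)\|
 + \alpha L \sum_{s=1}^{k-1} \sum_{j=1}^{n} \beta^{\lceil \frac{k-s}{B} \rceil - 2} \\
& + \sum_{s=1}^{k-1} \sum_{j=1}^{n} \beta^{\lceil \frac{k-s}{B} \rceil - 2} \|e_j(s)\|
 + 2\alpha L + \|e_i(k)\| + \frac{1}{n} \sum_{j=1}^{n} \|e_j(k)\|.
\end{aligned}
\]

Since $0 \le e_i(k) \le (1/Q)\mathbf{1}$ [cf. Eq. (12)], we have

\[
\|e_i(k)\| \le \frac{\sqrt{m}}{Q} \quad \text{for all } i \text{ and } k.
\]

From the preceding two relations, and the inequality $\max_j \|x_j^Q(0)\| \le \alpha L$, we obtain for all i and k,

\[
\|x_i^Q(k) - y(k)\| \le \alpha L n\, \beta^{\lceil \frac{k}{B} \rceil - 2}
 + \alpha n L \sum_{s=1}^{k-1} \beta^{\lceil \frac{k-s}{B} \rceil - 2}
 + \frac{n \sqrt{m}}{Q} \sum_{s=1}^{k-1} \beta^{\lceil \frac{k-s}{B} \rceil - 2}
 + 2\alpha L + \frac{2\sqrt{m}}{Q}.
\]

By using

\[
\sum_{s=1}^{k-1} \beta^{\lceil \frac{k-s}{B} \rceil - 2} \le \frac{1}{\beta} \sum_{r=0}^{\infty} \beta^{\lceil \frac{r+1}{B} \rceil - 1}
\quad \text{and} \quad
\sum_{r=0}^{\infty} \beta^{\lceil \frac{r+1}{B} \rceil - 1} \le B \sum_{r=0}^{\infty} \beta^{r} = \frac{B}{1-\beta},
\]

we finally obtain

\[
\|x_i^Q(k) - y(k)\| \le 2 \left( \alpha L + \frac{\sqrt{m}}{Q} \right) \left( 1 + \frac{nB}{\beta(1-\beta)} \right). \qquad ■
\]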



According to part (a) of Lemma 2, the vectors y(k) can be viewed as the iterates produced by the "fictitious" centralized algorithm:

\[
y(k+1) = y(k) - \frac{\alpha}{n} \sum_{j=1}^{n} d_j(k) - \frac{1}{n} \sum_{j=1}^{n} e_j(k+1), \tag{16}
\]

which is an approximate subgradient method with persistent noise: The direction $\sum_{j=1}^{n} d_j(k)$ is an approximate subgradient of the objective function f because each vector $d_j(k)$ is a subgradient of $f_j$ at $x_j^Q(k)$ instead of at y(k). The error term $(1/n) \sum_{j=1}^{n} e_j(k+1)$ can be viewed as the noise experienced by the whole system. The noise is persistent since the magnitudes of the errors $e_j(k)$ are non-diminishing.
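The recursion (16) can be observed directly in a simulation: with doubly stochastic weights, the network-wide average of the quantized estimates follows it exactly. The sketch below checks this with rational arithmetic for scalar estimates; the costs $f_i(x) = |x - c_i|$, the weights, the stepsize, and Q = 16 are assumptions for illustration.

```python
from fractions import Fraction
import math


def sign(v):
    return (v > 0) - (v < 0)


def quantized_step(xQ, A, targets, alpha, Q):
    """Scalar quantized step for the illustrative costs f_i(x) = |x - c_i|.
    Returns the new estimates, the subgradients d, and the errors e."""
    n = len(xQ)
    d = [Fraction(sign(xQ[i] - targets[i])) for i in range(n)]
    raw = [sum(A[i][j] * xQ[j] for j in range(n)) - alpha * d[i] for i in range(n)]
    new_x = [Fraction(math.floor(r * Q), Q) for r in raw]  # round down to multiple of 1/Q
    e = [r - v for r, v in zip(raw, new_x)]
    return new_x, d, e


A = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(0), Fraction(1, 4), Fraction(3, 4)],
]
targets = [Fraction(0), Fraction(1), Fraction(5)]
x = [Fraction(0), Fraction(2), Fraction(4)]
alpha, Q, n = Fraction(1, 10), 16, 3

y = sum(x) / n  # y(0) is the average of the initial quantized estimates
recursion_holds = True
for _ in range(50):
    x, d, e = quantized_step(x, A, targets, alpha, Q)
    y = y - alpha * sum(d) / n - sum(e) / n  # centralized recursion (16)
    recursion_holds = recursion_holds and (sum(x) / n == y)
```

The check relies on the columns of the weight matrix summing to one, so it would fail for weights that are only row stochastic.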

We now focus on establishing an error bound for the function values at the points y(k) of the stopped process of Eq. (16), starting with $y(0) = \frac{1}{n} \sum_{j=1}^{n} x_j^Q(0)$, and with the direction $d_j(k)$ being a subgradient of $f_j$ at $x_j^Q(k)$ for all j and k. The process y(k) is similar to the stopped process analyzed in [15], defined using $x_j(k)$ in place of $x_j^Q(k)$. Thus, using the same analysis as in [15] (see Lemma 5 therein), we can show the following basic result.

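Lemma 3: Let Assumptions 1 and 2 hold, and assume that the set $X^*$ of optimal solutions of problem (1) is nonempty. Let the sequence $\{y(k)\}$ be defined by Eq. (15), and the sequences $\{x_j^Q(k)\}$ for $j \in \{1, \ldots, n\}$ be generated by the quantized subgradient method (9). Also, assume that the subgradients are uniformly bounded as in Eq. (6), and that $\max_j \|x_j^Q(0)\| \le \alpha L$. Then, the average vectors $\hat{y}(k)$, defined as in Eq. (7), satisfy for all $k \ge 1$,

\[
f(\hat{y}(k)) \le f^* + \frac{n\, \mathrm{dist}^2(y(0), X^*)}{2\alpha k} + \frac{\alpha L^2 \tilde{C}}{2},
\]

where

\[
\tilde{C} = 1 + \frac{8n \tilde{C}_1}{\alpha L}, \qquad
\tilde{C}_1 = \left( \alpha L + \frac{\sqrt{m}}{Q} \right) \left( 1 + \frac{nB}{\beta(1-\beta)} \right),
\]

$\beta = 1 - \frac{\eta}{4n^2}$, and $y(0) = \frac{1}{n} \sum_{j=1}^{n} x_j^Q(0)$.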

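Proof: Using the same line of analysis as in the proof of Lemma 5 in [15], we can show that for all k,

\[
\begin{aligned}
\mathrm{dist}^2(y(k+1), X^*) \le{} & \mathrm{dist}^2(y(k), X^*)
 + \frac{2\alpha}{n} \sum_{j=1}^{n} \bigl( \|d_j(k)\| + \|g_j(k)\| \bigr) \|y(k) - x_j^Q(k)\| \\
& - \frac{2\alpha}{n} \bigl[ f(y(k)) - f^* \bigr]
 + \frac{\alpha^2}{n^2} \sum_{j=1}^{n} \|d_j(k)\|^2,
\end{aligned}
\]

where $g_j(k)$ is a subgradient of $f_j$ at y(k). By using the subgradient boundedness, we further obtain

\[
\mathrm{dist}^2(y(k+1), X^*) \le \mathrm{dist}^2(y(k), X^*)
 + \frac{4\alpha L}{n} \sum_{j=1}^{n} \|y(k) - x_j^Q(k)\|
 - \frac{2\alpha}{n} \bigl[ f(y(k)) - f^* \bigr] + \frac{\alpha^2 L^2}{n}.
\]

By using Lemma 2(b), we have

\[
\mathrm{dist}^2(y(k+1), X^*) \le \mathrm{dist}^2(y(k), X^*)
 + 8\alpha L \tilde{C}_1 - \frac{2\alpha}{n} \bigl[ f(y(k)) - f^* \bigr] + \frac{\alpha^2 L^2}{n},
\]

where $\tilde{C}_1 = \left( \alpha L + \frac{\sqrt{m}}{Q} \right) \left( 1 + \frac{nB}{\beta(1-\beta)} \right)$. Therefore,

\[
f(y(k)) \le f^* + \frac{\alpha L^2}{2} + 4nL\tilde{C}_1
 + \frac{n}{2\alpha} \bigl( \mathrm{dist}^2(y(k), X^*) - \mathrm{dist}^2(y(k+1), X^*) \bigr),
\]

and by regrouping the terms and introducing $\tilde{C} = 1 + \frac{8n\tilde{C}_1}{\alpha L}$, we have for all k,

\[
f(y(k)) \le f^* + \frac{\alpha L^2 \tilde{C}}{2}
 + \frac{n}{2\alpha} \bigl( \mathrm{dist}^2(y(k), X^*) - \mathrm{dist}^2(y(k+1), X^*) \bigr).
\]

By adding these inequalities for different values of k, and by using the convexity of f, we obtain the desired inequality. ■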

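Assuming that the agents can store real values (infinitely many bits), we consider the time-average of the iterates $x_i^Q(k)$, defined by

\[
\hat{x}_i^Q(k) = \frac{1}{k} \sum_{h=0}^{k-1} x_i^Q(h) \quad \text{for } k \ge 1.
\]

Using Lemma 3, we have the following result.

Theorem 2: Under the same assumptions as in Lemma 3, the averages $\hat{x}_i^Q(k)$ of the iterates obtained by the method (9) satisfy, for all i,

\[
f(\hat{x}_i^Q(k)) \le f^* + \frac{n\, \mathrm{dist}^2(y(0), X^*)}{2\alpha k} + \frac{\alpha L^2 \tilde{C}}{2} + 2nL\tilde{C}_1,
\]

where $\tilde{C}$, $\tilde{C}_1$, and y(0) are as in Lemma 3.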

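Proof: By the convexity of the functions $f_j$, we have, for any i and k,

\[
f(\hat{x}_i^Q(k)) \le f(\hat{y}(k)) + \sum_{j=1}^{n} g_{ij}(k)' \bigl( \hat{x}_i^Q(k) - \hat{y}(k) \bigr),
\]

where $g_{ij}(k)$ is a subgradient of $f_j$ at $\hat{x}_i^Q(k)$. Then, by using the boundedness of the subgradients and Lemma 2(b), we obtain for all i and k,

\[
f(\hat{x}_i^Q(k)) \le f(\hat{y}(k)) + 2nL\tilde{C}_1,
\]

with $\tilde{C}_1 = \left( \alpha L + \frac{\sqrt{m}}{Q} \right) \left( 1 + \frac{nB}{\beta(1-\beta)} \right)$. The result follows by using Lemma 3. ■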

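When the quantization becomes increasingly finer (i.e., $Q \to \infty$), the results of Theorems 2 and 1 coincide. More specifically, when $Q \to \infty$, the constants $\tilde{C}_1$ and $\tilde{C}$ of Theorem 2 satisfy

\[
\lim_{Q \to \infty} \tilde{C}_1 = \alpha L \left( 1 + \frac{nB}{\beta(1-\beta)} \right) = \alpha L C_1,
\]

\[
\lim_{Q \to \infty} \tilde{C} = 1 + \frac{8n}{\alpha L} \lim_{Q \to \infty} \tilde{C}_1 = 1 + 8nC_1,
\]

with $C_1 = 1 + \frac{nB}{\beta(1-\beta)}$. Thus, for the error term of Theorem 2 we have

\[
\lim_{Q \to \infty} \left( \frac{\alpha L^2 \tilde{C}}{2} + 2nL\tilde{C}_1 \right) = \frac{\alpha L^2}{2} C + 2n\alpha L^2 C_1,
\]

where $C = 1 + 8nC_1$ and $C_1 = 1 + \frac{nB}{\beta(1-\beta)}$. Hence, in the limit as $Q \to \infty$, the error terms in the estimate of Theorem 2 reduce to the error terms in the estimate of Theorem 1.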
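The limiting behavior can also be checked numerically. The sketch below evaluates the stepsize-dependent error terms of Theorems 1 and 2 (the terms that do not vanish as k grows) for increasing Q; the function names and the parameter values are assumptions for illustration.

```python
import math


def base_constant(n, B, eta):
    """C_1 = 1 + nB / (beta (1 - beta)) with beta = 1 - eta / (4 n^2)."""
    beta = 1.0 - eta / (4.0 * n * n)
    return 1.0 + n * B / (beta * (1.0 - beta))


def theorem1_error(n, B, eta, alpha, L):
    """alpha L^2 C / 2 + 2 alpha n L^2 C_1 with C = 1 + 8 n C_1."""
    C1 = base_constant(n, B, eta)
    C = 1.0 + 8.0 * n * C1
    return alpha * L ** 2 * C / 2.0 + 2.0 * alpha * n * L ** 2 * C1


def theorem2_error(n, B, eta, alpha, L, Q, m):
    """alpha L^2 C~ / 2 + 2 n L C~_1 with the quantized constants C~_1 and C~."""
    C1_t = (alpha * L + math.sqrt(m) / Q) * base_constant(n, B, eta)
    C_t = 1.0 + 8.0 * n * C1_t / (alpha * L)
    return alpha * L ** 2 * C_t / 2.0 + 2.0 * n * L * C1_t


# Hypothetical parameters.
params = dict(n=5, B=2, eta=0.2, alpha=0.01, L=1.0)
unquantized = theorem1_error(**params)
by_levels = {Q: theorem2_error(Q=Q, m=3, **params) for Q in (4, 64, 1024, 10 ** 9)}
```

For a fixed stepsize, the extra error due to quantization is governed by the ratio of $\sqrt{m}/Q$ to $\alpha L$, so a coarser quantizer is tolerable only when that ratio stays small.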

IV. Conclusions 

We studied distributed subgradient methods for 
convex optimization problems that arise in networks 
of agents connected through a time-varying topology. 
We first considered an algorithm for the case where 
agents can exchange and store continuous values, and 
proved a bound on the convergence rate. We next 
studied the algorithm under the additional constraint 
that agents can only send and receive quantized 
values. We showed that our algorithm guarantees con- 
vergence of the agent values to the optimal objective 
value within some error. Our bound on the error high- 
lights the dependence on the number of quantization 
levels, and the polynomial dependence on the number 
n of agents. Future work includes investigation of the 
effects of other quantization schemes and of noise 
in the agents' estimates on the performance of the 
algorithm. 